Learnable pooling with Context Gating for video classification
نویسندگان
چکیده
Common video representations often deploy an average or maximum pooling of pre-extracted frame features over time. Such an approach provides a simple means to encode feature distributions, but is likely to be suboptimal. As an alternative, we here explore combinations of learnable pooling techniques such as Soft Bag-of-words, Fisher Vectors, NetVLAD, GRU and LSTM to aggregate video features over time. We also introduce a learnable non-linear network unit, named Context Gating, aiming at modeling interdependencies between features. We evaluate the method on the multi-modal Youtube-8M Large-Scale Video Understanding dataset using pre-extracted visual and audio features. We demonstrate improvements provided by the Context Gating as well as by the combination of learnable pooling methods. We finally show how this leads to the best performance, out of more than 600 teams, in the Kaggle Youtube-8M Large-Scale Video Understanding challenge.
منابع مشابه
A Convolutional Neural Network based on Adaptive Pooling for Classification of Noisy Images
Convolutional neural network is one of the effective methods for classifying images that performs learning using convolutional, pooling and fully-connected layers. All kinds of noise disrupt the operation of this network. Noise images reduce classification accuracy and increase convolutional neural network training time. Noise is an unwanted signal that destroys the original signal. Noise chang...
متن کاملSecond-order Temporal Pooling for Action Recognition
Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth (max) or first order (average) statistic are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...
متن کاملLearning Region Features for Object Detection
While most steps in the modern object detection methods are learnable, the region feature extraction step remains largely handcrafted, featured by RoI pooling methods. This work proposes a general viewpoint that unifies existing region feature extraction methods and a novel method that is end-to-end learnable. The proposed method removes most heuristic choices and outperforms its RoI pooling co...
متن کاملA Fully Trainable Network with RNN-based Pooling
Pooling is an important component in convolutional neural networks (CNNs) for aggregating features and reducing computational burden. Compared with other components such as convolutional layers and fully connected layers which are completely learned from data, the pooling component is still handcrafted such as max pooling and average pooling. This paper proposes a learnable pooling function usi...
متن کاملLearnable Pooling Regions for Image Classification
Biologically inspired, from the early HMAX model to Spatial Pyramid Matching, pooling has played an important role in visual recognition pipelines. Spatial pooling, by grouping of local codes, equips these methods with a certain degree of robustness to translation and deformation yet preserving important spatial information. Despite the predominance of this approach in current recognition syste...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1706.06905 شماره
صفحات -
تاریخ انتشار 2017